Asymptotic distribution of motifs in a stochastic context-free grammar model of RNA folding.
نویسندگان
چکیده
We analyze the distribution of RNA secondary structures given by the Knudsen-Hein stochastic context-free grammar used in the prediction program Pfold. Our main theorem gives relations between the expected number of these motifs--independent of the grammar probabilities. These relations are a consequence of proving that the distribution of base pairs, of helices, and of different types of loops is asymptotically Gaussian in this model of RNA folding. Proof techniques use singularity analysis of probability generating functions. We also demonstrate that these asymptotic results capture well the expected number of RNA base pairs in native ribosomal structures, and certain other aspects of their predicted secondary structures. In particular, we find that the predicted structures largely satisfy the expected relations, although the native structures do not.
منابع مشابه
An Optimized Parsing Algorithm Well Suited to RNA Folding
The application of stochastic context-free grammars to the determination of RNA foldings allows a simple description of the sub-class of sought secondary structures, but it needs efficient parsing algorithms. The more classic thermodynamic model of folding, popularized by Zuker under the framework of dynamic programming algorithms, allows an easy computation of foldings but its use is delicate ...
متن کاملAn optimized parsing algorithm well suited to RNA
The application of stochastic context-free grammars to the determination of RNA foldings allows a simple description of the sub-class of sought secondary structures, but it needs eecient parsing algorithms. The more classic thermodynamic model of folding, popularized by Zuker under the framework of dynamic programming algorithms, allows an easy computation of foldings but its use is delicate wh...
متن کاملRecent Methods for RNA Modeling Using Stochastic Context-Free Grammars
Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatical...
متن کاملRNA secondary structure prediction and runtime optimization
1. Background RNA secondary structure Pseudoknots Non-coding RNA 2. CONTRAfold: Probabilistic RNA folding Overview of the algorithm Details of the algorithm Performance of CONTRAfold 3. Other RNA folding methods: Physics-based models and Stochastic Context Free Grammars Physics-based models Stochastic Context Free Grammars Advantages of CONTRAfold over these other approaches 4. How RNA folding ...
متن کاملFree Energy of Stochastic Context Free Grammar on Variational Bayes
Variational Bayesian learning is proposed for approximation method of Bayesian learning. In spite of efficiency and experimental good performance, their mathematical property has not yet been clarified. In this paper we analyze variational Bayesian Stochastic Context Free Grammar which includes the true distribution thus the model is nonidentifiable. We derive their asymptotic free energy. It i...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of mathematical biology
دوره 69 6-7 شماره
صفحات -
تاریخ انتشار 2014